Nonlinear Acceleration of Stochastic Algorithms

Scieur, Damien, Bach, Francis, d'Aspremont, Alexandre

Neural Information Processing Systems

Extrapolation methods use the last few iterates of an optimization algorithm to produce a better estimate of the optimum. They were shown to achieve optimal convergence rates in a deterministic setting using simple gradient iterates. Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm. We first derive convergence bounds for arbitrary, potentially biased perturbations, then produce asymptotic bounds using the ratio between the variance of the noise and the accuracy of the current point. Finally, we apply this acceleration technique to stochastic algorithms such as SGD, SAGA, SVRG and Katyusha in different settings, and show significant performance gains.
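The extrapolation step summarized in this abstract has a compact form: stack the last few iterates, solve a small regularized least-squares problem over their successive differences, and return the resulting weighted combination. Below is a minimal sketch in that spirit; the function name `rna_extrapolate` and the regularization parameter `lam` are illustrative, not taken from the authors' code.

```python
import numpy as np

def rna_extrapolate(iterates, lam=1e-8):
    """Regularized nonlinear acceleration, minimal sketch.

    iterates: list of k+1 successive iterates x_0, ..., x_k (1-D arrays).
    Returns a weighted combination of the iterates intended to be a
    better estimate of the optimum than x_k alone.
    """
    X = np.stack(iterates, axis=1)        # shape (d, k+1)
    R = X[:, 1:] - X[:, :-1]              # successive differences, (d, k)
    k = R.shape[1]
    M = R.T @ R
    M /= np.trace(M)                      # rescale for numerical stability
    # Regularized least squares over the differences; the coefficients
    # are normalized so that they sum to one.
    z = np.linalg.solve(M + lam * np.eye(k), np.ones(k))
    c = z / z.sum()
    return X[:, 1:] @ c                   # extrapolated point
```

On gradient descent applied to a strongly convex quadratic, the combined point typically lands much closer to the optimum than the last iterate, which is the deterministic behaviour the abstract refers to.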


Reviews: Nonlinear Acceleration of Stochastic Algorithms

Neural Information Processing Systems

The paper extends the recent work of Scieur et al. [2016] on nonlinear acceleration via extrapolation of sequences from deterministic to stochastic optimization. That work itself generalizes and extends results developed in the late 1960s and 1970s from quadratics to non-quadratics (whence the name "nonlinear"). Sequence extrapolation methods seem to have been "forgotten" or simply "not in use" by the ML and optimization communities until recently, and they have some interesting theoretical and practical properties. For instance, nonlinear regularized acceleration (NRA) is capable of accelerating the sequence of iterates formed by the gradient descent method and obtaining the optimal accelerated rate. This is done via what essentially amounts to a "bootstrapping" extrapolation process.


Riemannian accelerated gradient methods via extrapolation

Han, Andi, Mishra, Bamdev, Jawanpuria, Pratik, Gao, Junbin

arXiv.org Artificial Intelligence

Optimization on a Riemannian manifold naturally appears in various fields of application, including principal component analysis [22, 61], matrix completion and factorization [35, 56, 13], dictionary learning [17, 27], and optimal transport [49, 40, 26], to name a few. Riemannian optimization [2, 12] provides a universal and efficient framework for problem (1) that respects the intrinsic geometry of the constraint set. In addition, many non-convex problems turn out to be geodesically convex (a generalized notion of convexity) on the manifold, which yields better convergence guarantees for Riemannian optimization methods. One of the most fundamental solvers is the Riemannian gradient descent method [55, 62, 2, 12], which generalizes the classical gradient descent method in Euclidean space with intrinsic updates on manifolds. There also exist various advanced algorithms for Riemannian optimization, including stochastic and variance-reduced methods [11, 61, 34, 24, 25], adaptive gradient methods [8, 33], quasi-Newton methods [30, 43], trust-region methods [1], and cubic regularized Newton methods [3], among others. Nevertheless, it remains unclear whether there exists a simple strategy to accelerate first-order algorithms on Riemannian manifolds. Existing research on accelerated gradient methods focuses primarily on generalizing Nesterov acceleration [42] to Riemannian manifolds, including [37, 4, 63, 6, 31, 36]. However, most of these algorithms are theoretical constructs and tend to be less favourable in practice.
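As a concrete instance of the Riemannian gradient descent update described above, the sketch below optimizes over the unit sphere: the Euclidean gradient is projected onto the tangent space at the current point, a step is taken in that direction, and the result is retracted onto the manifold by renormalization. The names (`sphere_rgd`, `grad_f`) are illustrative, not from any of the cited works.

```python
import numpy as np

def sphere_rgd(grad_f, x0, step=0.1, iters=200):
    """Riemannian gradient descent on the unit sphere, minimal sketch.

    grad_f: returns the Euclidean gradient of the objective at x.
    """
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        g = grad_f(x)
        rg = g - np.dot(g, x) * x     # project onto the tangent space at x
        x = x - step * rg             # gradient step in the tangent direction
        x = x / np.linalg.norm(x)     # retraction: renormalize onto the sphere
    return x
```

Minimizing f(x) = -xᵀAx over the sphere this way recovers a leading eigenvector of A, a small cousin of the PCA application mentioned in the paragraph.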


Nonlinear Acceleration of CNNs

Scieur, Damien, Oyallon, Edouard, d'Aspremont, Alexandre, Bach, Francis

arXiv.org Machine Learning

The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descent, SAGA or SVRG. Until now, its analysis has been limited to convex problems, but empirical observations show that RNA may extend to wider settings. In this paper, we further investigate the benefits of RNA when applied to neural networks, in particular for the task of image recognition on CIFAR10 and ImageNet. With very few modifications to existing frameworks, RNA slightly improves the optimization process of CNNs after training.


Nonlinear Acceleration of Deep Neural Networks

Scieur, Damien, Oyallon, Edouard, d'Aspremont, Alexandre, Bach, Francis

arXiv.org Machine Learning

Regularized nonlinear acceleration (RNA) is a generic extrapolation scheme for optimization methods, with marginal computational overhead. It aims to improve convergence using only the iterates of simple iterative algorithms. However, so far its application to optimization has been theoretically limited to gradient descent and other single-step algorithms. Here, we adapt RNA to a much broader setting including stochastic gradient with momentum and Nesterov's fast gradient. We use it to train deep neural networks, and empirically observe that extrapolated networks are more accurate, especially in the early iterations. A straightforward application of our algorithm when training ResNet-152 on ImageNet produces a top-1 test error of 20.88%, improving on the reference classification pipeline by 0.8%. Furthermore, the code runs offline in this case, so it never negatively affects performance.
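The offline use described here can be pictured as post-processing: run a momentum iteration, save parameter snapshots, and only after training combine them with extrapolation weights, so the training loop itself is never touched. The sketch below does this on a toy quadratic; `combine_offline`, all constants, and the pair-stacking device used to handle momentum are illustrative assumptions, not necessarily the authors' exact adaptation.

```python
import numpy as np

def combine_offline(snapshots, lam=1e-8):
    """RNA-style weighted combination of saved snapshots (sketch).

    snapshots: list of flattened parameter vectors saved during training.
    This runs entirely offline, so it can never slow training down.
    """
    X = np.stack(snapshots, axis=1)
    R = X[:, 1:] - X[:, :-1]                   # successive differences
    M = R.T @ R
    M /= np.trace(M)                           # scale-invariant regularization
    k = M.shape[0]
    z = np.linalg.solve(M + lam * np.eye(k), np.ones(k))
    return X[:, 1:] @ (z / z.sum())            # weights sum to one

# Heavy-ball (momentum) iterates on a toy quadratic f(x) = x^T A x / 2,
# standing in for SGD with momentum on a real network.
A = np.diag([1.0, 10.0])
x, v = np.array([1.0, 1.0]), np.zeros(2)
xs = [x.copy()]
for _ in range(9):
    v = 0.5 * v - 0.05 * (A @ x)               # momentum 0.5, step size 0.05
    x = x + v
    xs.append(x.copy())

# Momentum makes the iteration second-order in x_t; stacking the pair
# (x_t, x_{t-1}) makes it first-order again, so extrapolation applies.
pairs = [np.concatenate([xs[t], xs[t - 1]]) for t in range(1, len(xs))]
x_off = combine_offline(pairs)[:2]             # runs after, never during, training
```

The combination step costs a single small linear solve over the saved snapshots, which is why the overhead is marginal relative to training.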

